Image Classification via Neural Networks: Dogs vs Cats

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import os, shutil

from keras import models, layers, optimizers
from keras.preprocessing.image import ImageDataGenerator
Using TensorFlow backend.

1. Data

1.1. Creating the Data

From the original Kaggle dataset of 25,000 labelled images, we create a new training set of 1,000 samples each for cats and dogs, a validation set with 500 of each, and a test set with 500 of each.

In [2]:
original_dataset_dir = '/home/d869321/Data/CatsVsDogs'
base_dir = '/home/d869321/Data/CatsVsDogsSmall'
os.mkdir(base_dir)

# create directories for the training, validation and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)

test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

# copy the first 2,000 images of each class into the train/validation/test splits
splits = [('train', range(1000)),
          ('validation', range(1000, 1500)),
          ('test', range(1500, 2000))]
for animal in ('cat', 'dog'):
    for split_name, indices in splits:
        dst_dir = os.path.join(base_dir, split_name, animal + 's')
        for i in indices:
            fname = '{}.{}.jpg'.format(animal, i)
            shutil.copyfile(os.path.join(original_dataset_dir, fname),
                            os.path.join(dst_dir, fname))

As a sanity check, let us count how many pictures are in each directory.

In [3]:
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
total training cat images: 1000
total training dog images: 1000
total validation cat images: 500
total validation dog images: 500
total test cat images: 500
total test dog images: 500

1.2. Data Manipulation

We need to convert all the JPEG files into preprocessed tensors of floating-point values.

In [4]:
# these three lines are repeated in case you did not re-create the small dataset directory above
base_dir = '/home/d869321/Data/CatsVsDogsSmall'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_datagen = ImageDataGenerator(rescale=1./255)  # rescale images by 1/255 so the values lie in [0, 1]
test_datagen = ImageDataGenerator(rescale=1./255)

# target_size resizes every image to 150x150 (non-square images are deformed);
# class_mode='binary' yields binary labels to match the binary_crossentropy loss
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150), 
                                                    batch_size=20, class_mode='binary')
validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(150, 150), 
                                                        batch_size=20, class_mode='binary')
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

The generators yield batches of 150x150 RGB images (i.e. shape (20, 150, 150, 3)) and binary labels (shape (20,)). If the input images are not square, they are deformed to fit the 150x150 target size.

In [5]:
from IPython.display import display
from PIL import Image

for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break
data batch shape: (20, 150, 150, 3)
labels batch shape: (20,)

We can display a sample of such pictures and their labels.

In [6]:
ind = 0
i = 0
for data_batch, labels_batch in train_generator:
    print('label:', labels_batch[ind])
    print('the %s-th image is:'%ind)
    imagePet = data_batch[ind]
    plt.imshow(imagePet)
    plt.show()
    i += 1
    if i > 3:
        break
label: 0.0
the 0-th image is:
label: 0.0
the 0-th image is:
label: 1.0
the 0-th image is:
label: 0.0
the 0-th image is:

2. Building Network Architecture

The CNN is a stack of alternating Conv2D and MaxPooling2D layers. The depth of the feature maps increases with each layer whilst their spatial size decreases.

As this is a binary classification problem, we end with a densely connected layer with a single unit and a sigmoid activation.
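The feature-map sizes produced by this stack can be checked by hand: a 3x3 convolution with 'valid' padding trims 2 pixels from each spatial dimension, and a 2x2 max pooling halves it (rounding down). A small sketch (the two helpers are our own, not part of the notebook):

```python
def conv3x3(n):
    # 'valid' padding: output size = n - kernel_size + 1
    return n - 2

def pool2x2(n):
    # non-overlapping 2x2 pooling floors the halving
    return n // 2

size = 150
sizes = []
for _ in range(4):  # four Conv2D + MaxPooling2D pairs
    size = conv3x3(size)
    sizes.append(size)
    size = pool2x2(size)
    sizes.append(size)

print(sizes)  # → [148, 74, 72, 36, 34, 17, 15, 7]
```

These numbers line up with the output shapes reported by `model.summary()`.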

In [7]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

We can inspect the network architecture.

In [8]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

As we need a binary classifier, we use binary crossentropy as our loss.
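As a reminder of what this loss computes, here is a minimal NumPy version (an illustrative sketch, not the Keras implementation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # loss = -(y * log(p) + (1 - y) * log(1 - p)), averaged over samples
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

y_true = np.array([1., 0., 1., 0.])
y_pred = np.array([0.9, 0.1, 0.6, 0.4])
loss = binary_crossentropy(y_true, y_pred)  # ≈ 0.308
```

Confident predictions on the correct side (0.9 for label 1) contribute little; confident mistakes are penalized heavily.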

In [9]:
model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-4), metrics=['acc'])

3. Model Fitting

We can fit the model using the training generator defined earlier. The model accuracy can be validated using the validation generator.

In [10]:
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=30, 
                              validation_data=validation_generator, validation_steps=50)
Epoch 1/30
100/100 [==============================] - 26s 262ms/step - loss: 0.6917 - acc: 0.5395 - val_loss: 0.6957 - val_acc: 0.6230
Epoch 2/30
100/100 [==============================] - 26s 261ms/step - loss: 0.6621 - acc: 0.6045 - val_loss: 0.5736 - val_acc: 0.6320
Epoch 3/30
100/100 [==============================] - 26s 261ms/step - loss: 0.6178 - acc: 0.6600 - val_loss: 0.6198 - val_acc: 0.6610
Epoch 4/30
100/100 [==============================] - 27s 268ms/step - loss: 0.5772 - acc: 0.7040 - val_loss: 0.4695 - val_acc: 0.6260
Epoch 5/30
100/100 [==============================] - 27s 266ms/step - loss: 0.5401 - acc: 0.7220 - val_loss: 0.5901 - val_acc: 0.6870
Epoch 6/30
100/100 [==============================] - 26s 264ms/step - loss: 0.5259 - acc: 0.7385 - val_loss: 0.5045 - val_acc: 0.6930
Epoch 7/30
100/100 [==============================] - 26s 264ms/step - loss: 0.4962 - acc: 0.7565 - val_loss: 0.5273 - val_acc: 0.6830
Epoch 8/30
100/100 [==============================] - 28s 281ms/step - loss: 0.4715 - acc: 0.7735 - val_loss: 0.5431 - val_acc: 0.7040
Epoch 9/30
100/100 [==============================] - 26s 264ms/step - loss: 0.4471 - acc: 0.7955 - val_loss: 0.6715 - val_acc: 0.7120
Epoch 10/30
100/100 [==============================] - 26s 265ms/step - loss: 0.4293 - acc: 0.8030 - val_loss: 0.3628 - val_acc: 0.6820
Epoch 11/30
100/100 [==============================] - 26s 264ms/step - loss: 0.3932 - acc: 0.8285 - val_loss: 0.7364 - val_acc: 0.7310
Epoch 12/30
100/100 [==============================] - 26s 264ms/step - loss: 0.3657 - acc: 0.8375 - val_loss: 0.6868 - val_acc: 0.7210
Epoch 13/30
100/100 [==============================] - 26s 264ms/step - loss: 0.3473 - acc: 0.8420 - val_loss: 0.4053 - val_acc: 0.7230
Epoch 14/30
100/100 [==============================] - 26s 264ms/step - loss: 0.3239 - acc: 0.8655 - val_loss: 0.6661 - val_acc: 0.7380
Epoch 15/30
100/100 [==============================] - 27s 270ms/step - loss: 0.3021 - acc: 0.8730 - val_loss: 0.3097 - val_acc: 0.7370
Epoch 16/30
100/100 [==============================] - 26s 264ms/step - loss: 0.2722 - acc: 0.8965 - val_loss: 0.5790 - val_acc: 0.7450
Epoch 17/30
100/100 [==============================] - 27s 266ms/step - loss: 0.2536 - acc: 0.9000 - val_loss: 0.9948 - val_acc: 0.7310
Epoch 18/30
100/100 [==============================] - 26s 265ms/step - loss: 0.2339 - acc: 0.9040 - val_loss: 0.4154 - val_acc: 0.7320
Epoch 19/30
100/100 [==============================] - 27s 266ms/step - loss: 0.2103 - acc: 0.9195 - val_loss: 1.2150 - val_acc: 0.7280
Epoch 20/30
100/100 [==============================] - 27s 265ms/step - loss: 0.1894 - acc: 0.9290 - val_loss: 0.6346 - val_acc: 0.7180
Epoch 21/30
100/100 [==============================] - 27s 266ms/step - loss: 0.1692 - acc: 0.9415 - val_loss: 0.2850 - val_acc: 0.7450
Epoch 22/30
100/100 [==============================] - 27s 265ms/step - loss: 0.1566 - acc: 0.9395 - val_loss: 0.5779 - val_acc: 0.7400
Epoch 23/30
100/100 [==============================] - 26s 265ms/step - loss: 0.1365 - acc: 0.9570 - val_loss: 1.6038 - val_acc: 0.7390
Epoch 24/30
100/100 [==============================] - 27s 266ms/step - loss: 0.1188 - acc: 0.9560 - val_loss: 0.5485 - val_acc: 0.7310
Epoch 25/30
100/100 [==============================] - 27s 272ms/step - loss: 0.1093 - acc: 0.9635 - val_loss: 0.5781 - val_acc: 0.7330
Epoch 26/30
100/100 [==============================] - 27s 266ms/step - loss: 0.0880 - acc: 0.9725 - val_loss: 0.2866 - val_acc: 0.7420
Epoch 27/30
100/100 [==============================] - 27s 265ms/step - loss: 0.0745 - acc: 0.9765 - val_loss: 1.2640 - val_acc: 0.7300
Epoch 28/30
100/100 [==============================] - 27s 269ms/step - loss: 0.0682 - acc: 0.9795 - val_loss: 0.7920 - val_acc: 0.7260
Epoch 29/30
100/100 [==============================] - 26s 265ms/step - loss: 0.0579 - acc: 0.9855 - val_loss: 0.4166 - val_acc: 0.7270
Epoch 30/30
100/100 [==============================] - 27s 266ms/step - loss: 0.0445 - acc: 0.9860 - val_loss: 0.8489 - val_acc: 0.7380
In [11]:
model.save('cats_and_dogs_small_1.h5')

4. Model Evaluation

As before, we can plot the loss and accuracy over the training and validation sets.

In [12]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7ff194527d10>

The network starts to overfit after roughly 15 epochs: training accuracy keeps climbing while validation accuracy plateaus.
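The onset of overfitting can also be estimated programmatically by watching when the validation loss stops improving; during training, Keras's `EarlyStopping` callback applies the same idea. A rough sketch (the `overfit_epoch` helper is our own):

```python
def overfit_epoch(val_loss, patience=5):
    """Return the epoch (1-based) of the best validation loss, once it has
    failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch
    return best_epoch  # never triggered: best epoch so far

# toy validation-loss curve that bottoms out at epoch 4
print(overfit_epoch([0.6, 0.5, 0.45, 0.4, 0.42, 0.44, 0.5, 0.55, 0.6]))  # → 4
```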

5. Data Augmentation

We apply various random transformations to the training images to generate new, plausible-looking images to train on:

  • rotate the image
  • translate the image (both horizontally and vertically)
  • shear the image
  • zoom in or out
  • flip horizontally (valid when left-right symmetry can be assumed)

We also need to specify how the empty regions created by such transformations are filled in. One option is to fill them with the values of the nearest pixels.

In [13]:
datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2,
                            shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')

We can have a look at what the augmented images look like.

In [14]:
from keras.preprocessing import image

base_dir = '/home/d869321/Data/CatsVsDogsSmall'
train_dir = os.path.join(base_dir, 'train')
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

fnames = [os.path.join(train_cats_dir, fname) for fname in os.listdir(train_cats_dir)]

img_path = fnames[5]  # choose one image to augment
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img)  # convert to numpy array
x = x.reshape((1,) + x.shape)  # reshape to (1, 150, 150, 3)

i = 0
for batch in datagen.flow(x, batch_size=1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
        

We can then fit a neural network using the augmented data. We start by defining a CNN with a dropout layer just before the densely connected classifier.

In [15]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-4), metrics=['acc'])

Let us now create the augmented data generators to use for training. Note that the validation and test data should not be augmented.

In [16]:
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40,
                                  width_shift_range=0.2, height_shift_range=0.2,
                                  shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150),
                                                   batch_size=32, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(150, 150),
                                                       batch_size=32, class_mode='binary')
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

We can then fit the model on the augmented data and subsequently validate it on the data in the validation directory.

In [17]:
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                             validation_data=validation_generator, validation_steps=50)
Epoch 1/100
100/100 [==============================] - 42s 420ms/step - loss: 0.6943 - acc: 0.5060 - val_loss: 0.6950 - val_acc: 0.5641
Epoch 2/100
100/100 [==============================] - 42s 419ms/step - loss: 0.6807 - acc: 0.5590 - val_loss: 0.6460 - val_acc: 0.5979
Epoch 3/100
100/100 [==============================] - 42s 424ms/step - loss: 0.6656 - acc: 0.5927 - val_loss: 0.6775 - val_acc: 0.6129
Epoch 4/100
100/100 [==============================] - 42s 423ms/step - loss: 0.6559 - acc: 0.6082 - val_loss: 0.6509 - val_acc: 0.6488
Epoch 5/100
100/100 [==============================] - 42s 424ms/step - loss: 0.6387 - acc: 0.6316 - val_loss: 0.7874 - val_acc: 0.5920
Epoch 6/100
100/100 [==============================] - 43s 429ms/step - loss: 0.6239 - acc: 0.6486 - val_loss: 0.5628 - val_acc: 0.6733
Epoch 7/100
100/100 [==============================] - 43s 426ms/step - loss: 0.6155 - acc: 0.6638 - val_loss: 0.5185 - val_acc: 0.6942
Epoch 8/100
100/100 [==============================] - 43s 434ms/step - loss: 0.5969 - acc: 0.6825 - val_loss: 0.6267 - val_acc: 0.6836
Epoch 9/100
100/100 [==============================] - 43s 429ms/step - loss: 0.5946 - acc: 0.6799 - val_loss: 0.6309 - val_acc: 0.7023
Epoch 10/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5884 - acc: 0.6828 - val_loss: 0.9271 - val_acc: 0.6091
Epoch 11/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5763 - acc: 0.6932 - val_loss: 0.6481 - val_acc: 0.6933
Epoch 12/100
100/100 [==============================] - 44s 438ms/step - loss: 0.5732 - acc: 0.7045 - val_loss: 0.4883 - val_acc: 0.7081
Epoch 13/100
100/100 [==============================] - 43s 430ms/step - loss: 0.5639 - acc: 0.7080 - val_loss: 0.6193 - val_acc: 0.6811
Epoch 14/100
100/100 [==============================] - 43s 428ms/step - loss: 0.5661 - acc: 0.7036 - val_loss: 0.6899 - val_acc: 0.6973
Epoch 15/100
100/100 [==============================] - 44s 436ms/step - loss: 0.5551 - acc: 0.7079 - val_loss: 0.5823 - val_acc: 0.7081
Epoch 16/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5559 - acc: 0.7222 - val_loss: 0.8956 - val_acc: 0.7229
Epoch 17/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5561 - acc: 0.7073 - val_loss: 0.5746 - val_acc: 0.7164
Epoch 18/100
100/100 [==============================] - 44s 436ms/step - loss: 0.5527 - acc: 0.7151 - val_loss: 0.3925 - val_acc: 0.7616
Epoch 19/100
100/100 [==============================] - 43s 432ms/step - loss: 0.5332 - acc: 0.7260 - val_loss: 0.5281 - val_acc: 0.7157
Epoch 20/100
100/100 [==============================] - 43s 433ms/step - loss: 0.5368 - acc: 0.7230 - val_loss: 0.5061 - val_acc: 0.7210
Epoch 21/100
100/100 [==============================] - 43s 429ms/step - loss: 0.5337 - acc: 0.7338 - val_loss: 0.3911 - val_acc: 0.7348
Epoch 22/100
100/100 [==============================] - 43s 430ms/step - loss: 0.5155 - acc: 0.7462 - val_loss: 0.5887 - val_acc: 0.7030
Epoch 23/100
100/100 [==============================] - 44s 439ms/step - loss: 0.5152 - acc: 0.7447 - val_loss: 0.4393 - val_acc: 0.7024
Epoch 24/100
100/100 [==============================] - 44s 438ms/step - loss: 0.5191 - acc: 0.7456 - val_loss: 0.4800 - val_acc: 0.7216
Epoch 25/100
100/100 [==============================] - 44s 436ms/step - loss: 0.5136 - acc: 0.7447 - val_loss: 0.3121 - val_acc: 0.7539
Epoch 26/100
100/100 [==============================] - 43s 432ms/step - loss: 0.5058 - acc: 0.7462 - val_loss: 0.5964 - val_acc: 0.7481
Epoch 27/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5080 - acc: 0.7494 - val_loss: 0.6246 - val_acc: 0.7674
Epoch 28/100
100/100 [==============================] - 43s 432ms/step - loss: 0.5020 - acc: 0.7544 - val_loss: 0.3748 - val_acc: 0.7265
Epoch 29/100
100/100 [==============================] - 43s 429ms/step - loss: 0.5056 - acc: 0.7412 - val_loss: 0.5157 - val_acc: 0.7822
Epoch 30/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4984 - acc: 0.7516 - val_loss: 0.6024 - val_acc: 0.7379
Epoch 31/100
100/100 [==============================] - 43s 427ms/step - loss: 0.5030 - acc: 0.7535 - val_loss: 0.5976 - val_acc: 0.7004
Epoch 32/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4847 - acc: 0.7616 - val_loss: 0.4300 - val_acc: 0.7758
Epoch 33/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4925 - acc: 0.7597 - val_loss: 0.4540 - val_acc: 0.7608
Epoch 34/100
100/100 [==============================] - 42s 424ms/step - loss: 0.4709 - acc: 0.7727 - val_loss: 0.4807 - val_acc: 0.7687
Epoch 35/100
100/100 [==============================] - 45s 449ms/step - loss: 0.4785 - acc: 0.7720 - val_loss: 0.3276 - val_acc: 0.7595
Epoch 36/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4839 - acc: 0.7636 - val_loss: 0.7129 - val_acc: 0.7513
Epoch 37/100
100/100 [==============================] - 43s 426ms/step - loss: 0.4661 - acc: 0.7806 - val_loss: 0.4817 - val_acc: 0.7424
Epoch 38/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4743 - acc: 0.7729 - val_loss: 0.4850 - val_acc: 0.7893
Epoch 39/100
100/100 [==============================] - 43s 425ms/step - loss: 0.4655 - acc: 0.7730 - val_loss: 0.4851 - val_acc: 0.7633
Epoch 40/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4772 - acc: 0.7762 - val_loss: 0.4266 - val_acc: 0.7700
Epoch 41/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4666 - acc: 0.7723 - val_loss: 0.5695 - val_acc: 0.7526
Epoch 42/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4588 - acc: 0.7822 - val_loss: 0.3659 - val_acc: 0.7608
Epoch 43/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4635 - acc: 0.7828 - val_loss: 0.5717 - val_acc: 0.7455
Epoch 44/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4717 - acc: 0.7701 - val_loss: 0.5731 - val_acc: 0.7792
Epoch 45/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4508 - acc: 0.7923 - val_loss: 0.4648 - val_acc: 0.7249
Epoch 46/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4561 - acc: 0.7842 - val_loss: 0.5439 - val_acc: 0.7957
Epoch 47/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4475 - acc: 0.7861 - val_loss: 0.5538 - val_acc: 0.7616
Epoch 48/100
100/100 [==============================] - 43s 425ms/step - loss: 0.4438 - acc: 0.7865 - val_loss: 0.2955 - val_acc: 0.7745
Epoch 49/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4465 - acc: 0.7880 - val_loss: 0.4793 - val_acc: 0.7912
Epoch 50/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4576 - acc: 0.7736 - val_loss: 0.5976 - val_acc: 0.7970
Epoch 51/100
100/100 [==============================] - 42s 423ms/step - loss: 0.4528 - acc: 0.7831 - val_loss: 0.3325 - val_acc: 0.8014
Epoch 52/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4387 - acc: 0.7989 - val_loss: 0.5242 - val_acc: 0.7358
Epoch 53/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4295 - acc: 0.8046 - val_loss: 0.6460 - val_acc: 0.7316
Epoch 54/100
100/100 [==============================] - 43s 430ms/step - loss: 0.4312 - acc: 0.7971 - val_loss: 0.4249 - val_acc: 0.8015
Epoch 55/100
100/100 [==============================] - 43s 426ms/step - loss: 0.4344 - acc: 0.7960 - val_loss: 0.4023 - val_acc: 0.7735
Epoch 56/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4394 - acc: 0.7968 - val_loss: 0.5015 - val_acc: 0.7796
Epoch 57/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4295 - acc: 0.7958 - val_loss: 0.2994 - val_acc: 0.7848
Epoch 58/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4262 - acc: 0.7993 - val_loss: 0.5274 - val_acc: 0.7855
Epoch 59/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4304 - acc: 0.7999 - val_loss: 0.3397 - val_acc: 0.7887
Epoch 60/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4278 - acc: 0.8005 - val_loss: 0.5567 - val_acc: 0.7354
Epoch 61/100
100/100 [==============================] - 43s 435ms/step - loss: 0.4147 - acc: 0.8078 - val_loss: 0.4718 - val_acc: 0.7854
Epoch 62/100
100/100 [==============================] - 44s 440ms/step - loss: 0.4248 - acc: 0.7968 - val_loss: 0.4209 - val_acc: 0.7468
Epoch 63/100
100/100 [==============================] - 43s 427ms/step - loss: 0.4159 - acc: 0.8087 - val_loss: 0.5651 - val_acc: 0.7326
Epoch 64/100
100/100 [==============================] - 43s 426ms/step - loss: 0.4101 - acc: 0.8059 - val_loss: 0.8502 - val_acc: 0.7390
Epoch 65/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4066 - acc: 0.8138 - val_loss: 0.5222 - val_acc: 0.8090
Epoch 66/100
100/100 [==============================] - 43s 428ms/step - loss: 0.4142 - acc: 0.8043 - val_loss: 0.5173 - val_acc: 0.8054
Epoch 67/100
100/100 [==============================] - 43s 429ms/step - loss: 0.4037 - acc: 0.8197 - val_loss: 0.4038 - val_acc: 0.7824
Epoch 68/100
100/100 [==============================] - 42s 425ms/step - loss: 0.3977 - acc: 0.8194 - val_loss: 0.4116 - val_acc: 0.7964
Epoch 69/100
100/100 [==============================] - 43s 432ms/step - loss: 0.4080 - acc: 0.8166 - val_loss: 0.4086 - val_acc: 0.8141
Epoch 70/100
100/100 [==============================] - 43s 426ms/step - loss: 0.4046 - acc: 0.8100 - val_loss: 0.3680 - val_acc: 0.8086
Epoch 71/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3801 - acc: 0.8320 - val_loss: 0.4633 - val_acc: 0.7963
Epoch 72/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3940 - acc: 0.8106 - val_loss: 0.5529 - val_acc: 0.7680
Epoch 73/100
100/100 [==============================] - 43s 426ms/step - loss: 0.3974 - acc: 0.8185 - val_loss: 0.3891 - val_acc: 0.8119
Epoch 74/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3912 - acc: 0.8169 - val_loss: 0.6402 - val_acc: 0.7525
Epoch 75/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3895 - acc: 0.8242 - val_loss: 0.3577 - val_acc: 0.7841
Epoch 76/100
100/100 [==============================] - 43s 431ms/step - loss: 0.3764 - acc: 0.8301 - val_loss: 0.4190 - val_acc: 0.8173
Epoch 77/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3862 - acc: 0.8194 - val_loss: 0.3709 - val_acc: 0.8170
Epoch 78/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3862 - acc: 0.8242 - val_loss: 0.3234 - val_acc: 0.7944
Epoch 79/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3858 - acc: 0.8257 - val_loss: 0.2529 - val_acc: 0.7919
Epoch 80/100
100/100 [==============================] - 42s 424ms/step - loss: 0.3755 - acc: 0.8384 - val_loss: 1.0740 - val_acc: 0.8125
Epoch 81/100
100/100 [==============================] - 43s 429ms/step - loss: 0.3760 - acc: 0.8239 - val_loss: 0.3690 - val_acc: 0.8338
Epoch 82/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3878 - acc: 0.8326 - val_loss: 0.3942 - val_acc: 0.7925
Epoch 83/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3652 - acc: 0.8396 - val_loss: 0.3333 - val_acc: 0.8027
Epoch 84/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3816 - acc: 0.8269 - val_loss: 0.2550 - val_acc: 0.7790
Epoch 85/100
100/100 [==============================] - 43s 426ms/step - loss: 0.3687 - acc: 0.8302 - val_loss: 0.4182 - val_acc: 0.8192
Epoch 86/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3625 - acc: 0.8425 - val_loss: 0.3841 - val_acc: 0.8331
Epoch 87/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3630 - acc: 0.8335 - val_loss: 0.4267 - val_acc: 0.7621
Epoch 88/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3628 - acc: 0.8417 - val_loss: 0.3869 - val_acc: 0.8138
Epoch 89/100
100/100 [==============================] - 43s 425ms/step - loss: 0.3595 - acc: 0.8385 - val_loss: 0.4164 - val_acc: 0.7957
Epoch 90/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3545 - acc: 0.8378 - val_loss: 0.4324 - val_acc: 0.8096
Epoch 91/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3540 - acc: 0.8425 - val_loss: 0.4524 - val_acc: 0.8131
Epoch 92/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3703 - acc: 0.8308 - val_loss: 0.5032 - val_acc: 0.8192
Epoch 93/100
100/100 [==============================] - 43s 428ms/step - loss: 0.3527 - acc: 0.8395 - val_loss: 0.5214 - val_acc: 0.8131
Epoch 94/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3488 - acc: 0.8415 - val_loss: 0.5036 - val_acc: 0.8115
Epoch 95/100
100/100 [==============================] - 43s 429ms/step - loss: 0.3536 - acc: 0.8464 - val_loss: 0.4676 - val_acc: 0.8235
Epoch 96/100
100/100 [==============================] - 42s 425ms/step - loss: 0.3391 - acc: 0.8512 - val_loss: 0.2139 - val_acc: 0.8215
Epoch 97/100
100/100 [==============================] - 43s 426ms/step - loss: 0.3388 - acc: 0.8508 - val_loss: 0.3948 - val_acc: 0.8185
Epoch 98/100
100/100 [==============================] - 43s 430ms/step - loss: 0.3412 - acc: 0.8455 - val_loss: 0.4115 - val_acc: 0.7957
Epoch 99/100
100/100 [==============================] - 43s 427ms/step - loss: 0.3247 - acc: 0.8580 - val_loss: 1.2644 - val_acc: 0.8179
Epoch 100/100
100/100 [==============================] - 43s 426ms/step - loss: 0.3544 - acc: 0.8327 - val_loss: 0.4067 - val_acc: 0.7964
In [18]:
model.save('cats_and_dogs_small_2.h5')

We can again replot the loss and accuracy over both the training and validation datasets.

In [19]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[19]:
<matplotlib.legend.Legend at 0x7ff17407cb90>

Thanks to data augmentation and dropout, we've boosted the validation accuracy to roughly 80%.
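The validation curves above are fairly noisy, which makes the trend hard to judge by eye; a light exponential moving average helps when plotting. A minimal sketch (the `smooth_curve` helper is our own addition):

```python
def smooth_curve(points, factor=0.8):
    # exponential moving average: each point is blended with the running value
    smoothed = []
    for point in points:
        if smoothed:
            smoothed.append(smoothed[-1] * factor + point * (1 - factor))
        else:
            smoothed.append(point)
    return smoothed

vals = [0.56, 0.60, 0.61, 0.65, 0.59]
sm = smooth_curve(vals)
# plt.plot(range(1, len(sm) + 1), sm) would then give a much smoother curve
```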

6. Leveraging a Pretrained Model

It is often helpful to reuse a pretrained network that was trained on a large-scale image-classification task. If the original dataset is large and general enough, the spatial hierarchy of features learned by the pretrained network can act as a generic model of the visual world and transfer to new tasks. There are two ways to leverage such a network: feature extraction and fine-tuning.

6.1. Feature Extraction

Every convolutional neural network consists of a convolutional base and a classifier. We can freeze the convolutional base and retrain only the classifier on the new image-classification task. We shall use the VGG16 model.

In [20]:
from keras.applications import VGG16

# 1st argument is the weight checkpoint from which to initialize the model
# 2nd argument is whether or not to include the densely connected classifier on top
# 3rd argument is the input shape of the image tensors; if not passed, the network can process inputs of any size
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

We need to know the shape of the output that will feed into the classifier, so let's have a look at the network architecture.

In [21]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

Note that the output of the final layer has shape (4, 4, 512). We can proceed in one of two ways:

  • run the convolutional base over the 2,000 training images to produce the (4, 4, 512) output for each image. This output can then be used as input to a standalone, densely connected classifier. This solution is computationally cheap, but it does not allow us to use data augmentation.
  • extend the convolutional base by adding dense layers on top and run the whole model end to end on the input data. This allows us to use data augmentation but is far more computationally expensive.
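Either way, the convolutional base's (4, 4, 512) output must eventually be flattened into a vector of 4 * 4 * 512 = 8192 values before it reaches a Dense layer. A minimal sketch of that reshaping with dummy data (NumPy only, no Keras required):

```python
import numpy as np

# stand-in for the conv_base output on a batch of 10 images
features = np.random.rand(10, 4, 4, 512)

# flatten each (4, 4, 512) feature map into a single 8192-vector
flat = features.reshape(10, 4 * 4 * 512)

print(flat.shape)  # (10, 8192)
```

This is exactly the reshape applied to the extracted features below.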

6.1.1. Fast Feature Extraction without Data Augmentation

We start by running an instance of ImageDataGenerator to load the images as Numpy arrays together with their labels. We then extract features from these images by calling the predict method of the conv_base model.

In [22]:
datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20

def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 4, 4, 512))  # dimension of final convolutional layer
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(directory, target_size=(150, 150), 
                                            batch_size=batch_size, class_mode='binary')
    
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)  # convolutional base output for the i-th batch
        features[(i * batch_size) : ((i + 1) * batch_size)] = features_batch
        labels[(i * batch_size) : ((i + 1) * batch_size)] = labels_batch
        i += 1
        if (i * batch_size) >= sample_count:
            break
    
    return features, labels

test_dir = os.path.join(base_dir, 'test')

train_features, train_labels = extract_features(train_dir, 2000)
validation_features, validation_labels = extract_features(validation_dir, 1000)
test_features, test_labels = extract_features(test_dir, 1000)

# need to flatten features from (samples, 4, 4, 512) to (samples, 8192)
train_features = np.reshape(train_features, (2000, 4 * 4 * 512))
validation_features = np.reshape(validation_features, (1000, 4 * 4 * 512))
test_features = np.reshape(test_features, (1000, 4 * 4 * 512))
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.

We can now define a densely connected classifier to train on these extracted features. We shall apply dropout after the first layer.

In [23]:
model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_dim=(4 * 4 * 512)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=optimizers.RMSprop(lr=2e-5), loss='binary_crossentropy', metrics=['acc'])

history = model.fit(train_features, train_labels, epochs=30, batch_size=20, 
                   validation_data=(validation_features, validation_labels))
Train on 2000 samples, validate on 1000 samples
Epoch 1/30
2000/2000 [==============================] - 1s 677us/step - loss: 0.5809 - acc: 0.6895 - val_loss: 0.4235 - val_acc: 0.8480
Epoch 2/30
2000/2000 [==============================] - 1s 641us/step - loss: 0.4132 - acc: 0.8195 - val_loss: 0.3509 - val_acc: 0.8690
Epoch 3/30
2000/2000 [==============================] - 1s 645us/step - loss: 0.3473 - acc: 0.8565 - val_loss: 0.3360 - val_acc: 0.8590
Epoch 4/30
2000/2000 [==============================] - 1s 625us/step - loss: 0.3064 - acc: 0.8725 - val_loss: 0.2945 - val_acc: 0.8900
Epoch 5/30
2000/2000 [==============================] - 1s 622us/step - loss: 0.2833 - acc: 0.8900 - val_loss: 0.2807 - val_acc: 0.8950
Epoch 6/30
2000/2000 [==============================] - 1s 630us/step - loss: 0.2588 - acc: 0.8955 - val_loss: 0.2810 - val_acc: 0.8820
Epoch 7/30
2000/2000 [==============================] - 1s 630us/step - loss: 0.2503 - acc: 0.8990 - val_loss: 0.2619 - val_acc: 0.8970
Epoch 8/30
2000/2000 [==============================] - 1s 644us/step - loss: 0.2261 - acc: 0.9175 - val_loss: 0.2622 - val_acc: 0.8870
Epoch 9/30
2000/2000 [==============================] - 1s 621us/step - loss: 0.2133 - acc: 0.9215 - val_loss: 0.2544 - val_acc: 0.8960
Epoch 10/30
2000/2000 [==============================] - 1s 641us/step - loss: 0.2068 - acc: 0.9265 - val_loss: 0.2481 - val_acc: 0.8990
Epoch 11/30
2000/2000 [==============================] - 1s 659us/step - loss: 0.1922 - acc: 0.9255 - val_loss: 0.2589 - val_acc: 0.8910
Epoch 12/30
2000/2000 [==============================] - 1s 632us/step - loss: 0.1828 - acc: 0.9370 - val_loss: 0.2549 - val_acc: 0.8940
Epoch 13/30
2000/2000 [==============================] - 1s 641us/step - loss: 0.1783 - acc: 0.9320 - val_loss: 0.2408 - val_acc: 0.9020
Epoch 14/30
2000/2000 [==============================] - 1s 679us/step - loss: 0.1796 - acc: 0.9370 - val_loss: 0.2396 - val_acc: 0.9000
Epoch 15/30
2000/2000 [==============================] - 1s 627us/step - loss: 0.1615 - acc: 0.9425 - val_loss: 0.2374 - val_acc: 0.8980
Epoch 16/30
2000/2000 [==============================] - 1s 631us/step - loss: 0.1495 - acc: 0.9475 - val_loss: 0.2438 - val_acc: 0.9000
Epoch 17/30
2000/2000 [==============================] - 1s 621us/step - loss: 0.1439 - acc: 0.9520 - val_loss: 0.2412 - val_acc: 0.9030
Epoch 18/30
2000/2000 [==============================] - 1s 628us/step - loss: 0.1425 - acc: 0.9495 - val_loss: 0.2350 - val_acc: 0.9000
Epoch 19/30
2000/2000 [==============================] - 1s 628us/step - loss: 0.1374 - acc: 0.9540 - val_loss: 0.2392 - val_acc: 0.9040
Epoch 20/30
2000/2000 [==============================] - 1s 627us/step - loss: 0.1331 - acc: 0.9530 - val_loss: 0.2392 - val_acc: 0.9030
Epoch 21/30
2000/2000 [==============================] - 1s 627us/step - loss: 0.1270 - acc: 0.9580 - val_loss: 0.2371 - val_acc: 0.9040
Epoch 22/30
2000/2000 [==============================] - 1s 630us/step - loss: 0.1230 - acc: 0.9580 - val_loss: 0.2364 - val_acc: 0.9030
Epoch 23/30
2000/2000 [==============================] - 1s 630us/step - loss: 0.1180 - acc: 0.9615 - val_loss: 0.2373 - val_acc: 0.9030
Epoch 24/30
2000/2000 [==============================] - 1s 633us/step - loss: 0.1150 - acc: 0.9615 - val_loss: 0.2357 - val_acc: 0.9030
Epoch 25/30
2000/2000 [==============================] - 1s 662us/step - loss: 0.1090 - acc: 0.9675 - val_loss: 0.2429 - val_acc: 0.9050
Epoch 26/30
2000/2000 [==============================] - 1s 660us/step - loss: 0.1038 - acc: 0.9690 - val_loss: 0.2367 - val_acc: 0.9010
Epoch 27/30
2000/2000 [==============================] - 1s 626us/step - loss: 0.1042 - acc: 0.9635 - val_loss: 0.2469 - val_acc: 0.9050
Epoch 28/30
2000/2000 [==============================] - 1s 625us/step - loss: 0.0962 - acc: 0.9715 - val_loss: 0.2466 - val_acc: 0.9050
Epoch 29/30
2000/2000 [==============================] - 1s 642us/step - loss: 0.0934 - acc: 0.9675 - val_loss: 0.2398 - val_acc: 0.9030
Epoch 30/30
2000/2000 [==============================] - 1s 635us/step - loss: 0.0893 - acc: 0.9730 - val_loss: 0.2373 - val_acc: 0.9040

Let's plot the performance of the model.

In [24]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[24]:
<matplotlib.legend.Legend at 0x7ff1946669d0>

We reach an accuracy of ~90% on the validation dataset. Let's see whether we can improve on this by allowing for data augmentation.

6.1.2. Feature Extraction with Data Augmentation

Here, we simply extend the convolutional base model with a classifier.

In [25]:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

The model architecture is as follows:

In [26]:
model.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688  
_________________________________________________________________
flatten_3 (Flatten)          (None, 8192)              0         
_________________________________________________________________
dense_7 (Dense)              (None, 256)               2097408   
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 257       
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________

We need to freeze the convolutional base to ensure that the representations it has learned are not destroyed while the randomly initialized weights of the classifier are updated.

In [27]:
print('This is the number of trainable weights before freezing the conv base', len(model.trainable_weights))

conv_base.trainable = False
print('This is the number of trainable weights after freezing the conv base', len(model.trainable_weights))
This is the number of trainable weights before freezing the conv base 30
This is the number of trainable weights after freezing the conv base 4

The 4 trainable weights are the kernel (weight) matrix and bias vector of each of the two Dense layers.
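As a sanity check, the parameter counts in the model summary above follow from simple arithmetic: a Dense layer with n_in inputs and n_out units has n_in * n_out kernel weights plus n_out biases.

```python
def dense_params(n_in, n_out):
    # kernel matrix plus bias vector
    return n_in * n_out + n_out

print(dense_params(8192, 256))  # 2097408, matching dense_7
print(dense_params(256, 1))     # 257, matching dense_8

# VGG16 base plus the two Dense layers gives the model total
head = dense_params(8192, 256) + dense_params(256, 1)
print(14714688 + head)          # 16812353
```

With the base frozen, only these 2,097,665 classifier parameters are updated during training.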

For these changes to take effect, we must recompile the model.

In [28]:
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40, 
                                   width_shift_range=0.2, height_shift_range=0.2,
                                   shear_range=0.2, zoom_range=0.2, 
                                   horizontal_flip=True, fill_mode='nearest')

# the test dataset should NOT be augmented!!!
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(150, 150), 
                                                    batch_size=20, class_mode='binary')

validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(150, 150),
                                                       batch_size=20, class_mode='binary')

model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=2e-5), metrics=['acc'])

history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=30, 
                              validation_data=validation_generator, validation_steps=50)
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/30
100/100 [==============================] - 158s 2s/step - loss: 0.6100 - acc: 0.6645 - val_loss: 0.5518 - val_acc: 0.8230
Epoch 2/30
100/100 [==============================] - 154s 2s/step - loss: 0.4836 - acc: 0.7910 - val_loss: 0.4166 - val_acc: 0.8410
Epoch 3/30
100/100 [==============================] - 155s 2s/step - loss: 0.4469 - acc: 0.7995 - val_loss: 0.1546 - val_acc: 0.8680
Epoch 4/30
100/100 [==============================] - 154s 2s/step - loss: 0.4084 - acc: 0.8170 - val_loss: 0.1767 - val_acc: 0.8770
Epoch 5/30
100/100 [==============================] - 154s 2s/step - loss: 0.3855 - acc: 0.8255 - val_loss: 0.3572 - val_acc: 0.8800
Epoch 6/30
100/100 [==============================] - 154s 2s/step - loss: 0.3777 - acc: 0.8250 - val_loss: 0.4798 - val_acc: 0.8840
Epoch 7/30
100/100 [==============================] - 154s 2s/step - loss: 0.3532 - acc: 0.8525 - val_loss: 0.2043 - val_acc: 0.8860
Epoch 8/30
100/100 [==============================] - 154s 2s/step - loss: 0.3533 - acc: 0.8475 - val_loss: 0.1975 - val_acc: 0.8860
Epoch 9/30
100/100 [==============================] - 155s 2s/step - loss: 0.3448 - acc: 0.8480 - val_loss: 0.1521 - val_acc: 0.8810
Epoch 10/30
100/100 [==============================] - 154s 2s/step - loss: 0.3335 - acc: 0.8515 - val_loss: 0.3952 - val_acc: 0.8880
Epoch 11/30
100/100 [==============================] - 155s 2s/step - loss: 0.3178 - acc: 0.8635 - val_loss: 0.1743 - val_acc: 0.8840
Epoch 12/30
100/100 [==============================] - 154s 2s/step - loss: 0.3294 - acc: 0.8510 - val_loss: 0.3480 - val_acc: 0.8890
Epoch 13/30
100/100 [==============================] - 154s 2s/step - loss: 0.3296 - acc: 0.8535 - val_loss: 0.2850 - val_acc: 0.8930
Epoch 14/30
100/100 [==============================] - 154s 2s/step - loss: 0.3189 - acc: 0.8570 - val_loss: 0.2812 - val_acc: 0.8870
Epoch 15/30
100/100 [==============================] - 154s 2s/step - loss: 0.3148 - acc: 0.8590 - val_loss: 0.1292 - val_acc: 0.8930
Epoch 16/30
100/100 [==============================] - 154s 2s/step - loss: 0.3208 - acc: 0.8650 - val_loss: 0.4054 - val_acc: 0.8970
Epoch 17/30
100/100 [==============================] - 154s 2s/step - loss: 0.3006 - acc: 0.8720 - val_loss: 0.2368 - val_acc: 0.8970
Epoch 18/30
100/100 [==============================] - 154s 2s/step - loss: 0.3095 - acc: 0.8625 - val_loss: 0.1441 - val_acc: 0.8930
Epoch 19/30
100/100 [==============================] - 154s 2s/step - loss: 0.3137 - acc: 0.8665 - val_loss: 0.0950 - val_acc: 0.8910
Epoch 20/30
100/100 [==============================] - 154s 2s/step - loss: 0.3064 - acc: 0.8685 - val_loss: 0.1767 - val_acc: 0.8960
Epoch 21/30
100/100 [==============================] - 154s 2s/step - loss: 0.2947 - acc: 0.8735 - val_loss: 0.1296 - val_acc: 0.8850
Epoch 22/30
100/100 [==============================] - 154s 2s/step - loss: 0.2779 - acc: 0.8845 - val_loss: 0.1780 - val_acc: 0.8900
Epoch 23/30
100/100 [==============================] - 154s 2s/step - loss: 0.2828 - acc: 0.8885 - val_loss: 0.1805 - val_acc: 0.9050
Epoch 24/30
100/100 [==============================] - 156s 2s/step - loss: 0.2936 - acc: 0.8700 - val_loss: 0.1159 - val_acc: 0.9010
Epoch 25/30
100/100 [==============================] - 154s 2s/step - loss: 0.2930 - acc: 0.8790 - val_loss: 0.0727 - val_acc: 0.8950
Epoch 26/30
100/100 [==============================] - 154s 2s/step - loss: 0.2884 - acc: 0.8760 - val_loss: 0.1816 - val_acc: 0.8930
Epoch 27/30
100/100 [==============================] - 154s 2s/step - loss: 0.2929 - acc: 0.8690 - val_loss: 0.3374 - val_acc: 0.8910
Epoch 28/30
100/100 [==============================] - 154s 2s/step - loss: 0.2821 - acc: 0.8840 - val_loss: 0.2939 - val_acc: 0.9000
Epoch 29/30
100/100 [==============================] - 154s 2s/step - loss: 0.2782 - acc: 0.8750 - val_loss: 0.1719 - val_acc: 0.8970
Epoch 30/30
100/100 [==============================] - 154s 2s/step - loss: 0.2927 - acc: 0.8695 - val_loss: 0.2252 - val_acc: 0.9000

Let's check the performance of the model.

In [39]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[39]:
<matplotlib.legend.Legend at 0x7f7fc4624890>

The validation accuracy is again ~90%.

6.2. Fine Tuning

The second way to use a pretrained model is fine-tuning. This consists of unfreezing a few of the top layers of the convolutional base and jointly training them together with the fully connected classifier on top. This slightly adjusts the more abstract representations of the model to make them more relevant to the problem at hand. We choose to unfreeze the layers after the penultimate MaxPooling2D layer.

In order to avoid destroying these higher-level representations, we first freeze the whole convolutional base and train the classifier. Only once the classifier is trained do we unfreeze the top layers of the convolutional base and update them jointly with the classifier.
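The unfreezing step walks the base's layers in order and flips a flag once it reaches block5_conv1, so everything before that layer stays frozen. A stand-alone sketch of that logic, using an abridged list of plain layer-name strings instead of Keras layers:

```python
# abridged, ordered layer names from the VGG16 convolutional base
layer_names = ['block1_conv1', 'block1_conv2', 'block1_pool',
               'block4_pool', 'block5_conv1', 'block5_conv2',
               'block5_conv3', 'block5_pool']

trainable = {}
set_trainable = False
for name in layer_names:
    if name == 'block5_conv1':
        set_trainable = True   # everything from here on is unfrozen
    trainable[name] = set_trainable

print([n for n, t in trainable.items() if t])
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

The real loop over conv_base.layers below sets layer.trainable in exactly this pattern.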

Here is the current convolutional base:

In [29]:
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________

We will freeze all layers up to and including block4_pool; the layers block5_conv1, block5_conv2 and block5_conv3 will be trainable.

In [30]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

In training the model, we use a very low learning rate to limit the size of the updates made to the representations we are fine-tuning.

In [31]:
model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-5), metrics=['acc'])

history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                             validation_data=validation_generator, validation_steps=50)
Epoch 1/100
100/100 [==============================] - 177s 2s/step - loss: 0.2919 - acc: 0.8700 - val_loss: 0.2097 - val_acc: 0.8980
Epoch 2/100
100/100 [==============================] - 175s 2s/step - loss: 0.2497 - acc: 0.8970 - val_loss: 0.1153 - val_acc: 0.9150
Epoch 3/100
100/100 [==============================] - 175s 2s/step - loss: 0.2464 - acc: 0.8965 - val_loss: 0.0415 - val_acc: 0.9130
Epoch 4/100
100/100 [==============================] - 175s 2s/step - loss: 0.2285 - acc: 0.9020 - val_loss: 0.0196 - val_acc: 0.9220
Epoch 5/100
100/100 [==============================] - 175s 2s/step - loss: 0.2056 - acc: 0.9225 - val_loss: 0.0044 - val_acc: 0.9260
Epoch 6/100
100/100 [==============================] - 176s 2s/step - loss: 0.1909 - acc: 0.9145 - val_loss: 0.2757 - val_acc: 0.9250
Epoch 7/100
100/100 [==============================] - 178s 2s/step - loss: 0.1769 - acc: 0.9290 - val_loss: 0.0189 - val_acc: 0.9260
Epoch 8/100
100/100 [==============================] - 176s 2s/step - loss: 0.1734 - acc: 0.9335 - val_loss: 0.0578 - val_acc: 0.9360
Epoch 9/100
100/100 [==============================] - 176s 2s/step - loss: 0.1629 - acc: 0.9355 - val_loss: 0.2552 - val_acc: 0.9240
Epoch 10/100
100/100 [==============================] - 176s 2s/step - loss: 0.1438 - acc: 0.9450 - val_loss: 0.2256 - val_acc: 0.9190
Epoch 11/100
100/100 [==============================] - 176s 2s/step - loss: 0.1375 - acc: 0.9465 - val_loss: 0.2209 - val_acc: 0.9190
Epoch 12/100
100/100 [==============================] - 176s 2s/step - loss: 0.1327 - acc: 0.9450 - val_loss: 0.2837 - val_acc: 0.9220
Epoch 13/100
100/100 [==============================] - 176s 2s/step - loss: 0.1370 - acc: 0.9470 - val_loss: 0.3102 - val_acc: 0.9310
Epoch 14/100
100/100 [==============================] - 176s 2s/step - loss: 0.1246 - acc: 0.9465 - val_loss: 0.0358 - val_acc: 0.9380
Epoch 15/100
100/100 [==============================] - 176s 2s/step - loss: 0.1195 - acc: 0.9560 - val_loss: 0.0615 - val_acc: 0.9270
Epoch 16/100
100/100 [==============================] - 176s 2s/step - loss: 0.1154 - acc: 0.9545 - val_loss: 0.4056 - val_acc: 0.9290
Epoch 17/100
100/100 [==============================] - 176s 2s/step - loss: 0.1070 - acc: 0.9580 - val_loss: 0.0391 - val_acc: 0.9380
Epoch 18/100
100/100 [==============================] - 176s 2s/step - loss: 0.1016 - acc: 0.9570 - val_loss: 0.0930 - val_acc: 0.9370
Epoch 19/100
100/100 [==============================] - 176s 2s/step - loss: 0.1061 - acc: 0.9560 - val_loss: 0.0866 - val_acc: 0.9380
Epoch 20/100
100/100 [==============================] - 176s 2s/step - loss: 0.0918 - acc: 0.9645 - val_loss: 0.0648 - val_acc: 0.9190
Epoch 21/100
100/100 [==============================] - 176s 2s/step - loss: 0.0919 - acc: 0.9670 - val_loss: 0.0784 - val_acc: 0.9370
Epoch 22/100
100/100 [==============================] - 176s 2s/step - loss: 0.0980 - acc: 0.9625 - val_loss: 0.0728 - val_acc: 0.9190
Epoch 23/100
100/100 [==============================] - 176s 2s/step - loss: 0.0909 - acc: 0.9605 - val_loss: 0.3971 - val_acc: 0.9270
Epoch 24/100
100/100 [==============================] - 176s 2s/step - loss: 0.0821 - acc: 0.9695 - val_loss: 0.0764 - val_acc: 0.9170
Epoch 25/100
100/100 [==============================] - 177s 2s/step - loss: 0.0856 - acc: 0.9710 - val_loss: 0.4969 - val_acc: 0.9320
Epoch 26/100
100/100 [==============================] - 176s 2s/step - loss: 0.0876 - acc: 0.9670 - val_loss: 0.0145 - val_acc: 0.9360
Epoch 27/100
100/100 [==============================] - 176s 2s/step - loss: 0.0817 - acc: 0.9680 - val_loss: 0.0341 - val_acc: 0.9400
Epoch 28/100
100/100 [==============================] - 178s 2s/step - loss: 0.0672 - acc: 0.9780 - val_loss: 0.2337 - val_acc: 0.9370
Epoch 29/100
100/100 [==============================] - 176s 2s/step - loss: 0.0846 - acc: 0.9660 - val_loss: 0.1364 - val_acc: 0.9340
Epoch 30/100
100/100 [==============================] - 176s 2s/step - loss: 0.0547 - acc: 0.9810 - val_loss: 0.0164 - val_acc: 0.9420
Epoch 31/100
100/100 [==============================] - 176s 2s/step - loss: 0.0685 - acc: 0.9780 - val_loss: 0.1767 - val_acc: 0.9400
Epoch 32/100
100/100 [==============================] - 176s 2s/step - loss: 0.0699 - acc: 0.9740 - val_loss: 0.1116 - val_acc: 0.9320
Epoch 33/100
100/100 [==============================] - 176s 2s/step - loss: 0.0456 - acc: 0.9830 - val_loss: 0.1422 - val_acc: 0.9360
Epoch 34/100
100/100 [==============================] - 175s 2s/step - loss: 0.0726 - acc: 0.9735 - val_loss: 0.1354 - val_acc: 0.9340
Epoch 35/100
100/100 [==============================] - 176s 2s/step - loss: 0.0599 - acc: 0.9765 - val_loss: 0.0135 - val_acc: 0.9380
Epoch 36/100
100/100 [==============================] - 176s 2s/step - loss: 0.0527 - acc: 0.9795 - val_loss: 0.4169 - val_acc: 0.9300
Epoch 37/100
100/100 [==============================] - 176s 2s/step - loss: 0.0628 - acc: 0.9800 - val_loss: 0.4465 - val_acc: 0.9160
Epoch 38/100
100/100 [==============================] - 176s 2s/step - loss: 0.0573 - acc: 0.9755 - val_loss: 0.2015 - val_acc: 0.9410
Epoch 39/100
100/100 [==============================] - 176s 2s/step - loss: 0.0558 - acc: 0.9820 - val_loss: 0.0349 - val_acc: 0.9320
Epoch 40/100
100/100 [==============================] - 176s 2s/step - loss: 0.0433 - acc: 0.9820 - val_loss: 0.2263 - val_acc: 0.9350
Epoch 41/100
100/100 [==============================] - 176s 2s/step - loss: 0.0460 - acc: 0.9810 - val_loss: 0.2815 - val_acc: 0.9360
Epoch 42/100
100/100 [==============================] - 176s 2s/step - loss: 0.0464 - acc: 0.9820 - val_loss: 0.3312 - val_acc: 0.9320
Epoch 43/100
100/100 [==============================] - 176s 2s/step - loss: 0.0440 - acc: 0.9850 - val_loss: 0.2639 - val_acc: 0.9330
Epoch 44/100
100/100 [==============================] - 176s 2s/step - loss: 0.0393 - acc: 0.9840 - val_loss: 0.0923 - val_acc: 0.9380
Epoch 45/100
100/100 [==============================] - 177s 2s/step - loss: 0.0412 - acc: 0.9850 - val_loss: 0.2380 - val_acc: 0.9310
Epoch 46/100
100/100 [==============================] - 176s 2s/step - loss: 0.0462 - acc: 0.9820 - val_loss: 0.4015 - val_acc: 0.9360
Epoch 47/100
100/100 [==============================] - 176s 2s/step - loss: 0.0465 - acc: 0.9810 - val_loss: 0.2502 - val_acc: 0.9420
Epoch 48/100
100/100 [==============================] - 178s 2s/step - loss: 0.0429 - acc: 0.9860 - val_loss: 4.4782e-04 - val_acc: 0.9370
Epoch 49/100
100/100 [==============================] - 176s 2s/step - loss: 0.0425 - acc: 0.9850 - val_loss: 0.1513 - val_acc: 0.9330
Epoch 50/100
100/100 [==============================] - 176s 2s/step - loss: 0.0441 - acc: 0.9815 - val_loss: 0.5874 - val_acc: 0.9360
Epoch 51/100
100/100 [==============================] - 176s 2s/step - loss: 0.0314 - acc: 0.9915 - val_loss: 0.2330 - val_acc: 0.9290
Epoch 52/100
100/100 [==============================] - 176s 2s/step - loss: 0.0377 - acc: 0.9870 - val_loss: 0.0057 - val_acc: 0.9060
Epoch 53/100
100/100 [==============================] - 176s 2s/step - loss: 0.0288 - acc: 0.9915 - val_loss: 0.2173 - val_acc: 0.9370
Epoch 54/100
100/100 [==============================] - 176s 2s/step - loss: 0.0230 - acc: 0.9920 - val_loss: 0.5396 - val_acc: 0.9250
Epoch 55/100
100/100 [==============================] - 176s 2s/step - loss: 0.0356 - acc: 0.9845 - val_loss: 0.1316 - val_acc: 0.9430
Epoch 56/100
100/100 [==============================] - 176s 2s/step - loss: 0.0347 - acc: 0.9865 - val_loss: 0.0760 - val_acc: 0.9390
Epoch 57/100
100/100 [==============================] - 176s 2s/step - loss: 0.0295 - acc: 0.9905 - val_loss: 0.1483 - val_acc: 0.9420
Epoch 58/100
100/100 [==============================] - 176s 2s/step - loss: 0.0405 - acc: 0.9880 - val_loss: 0.1526 - val_acc: 0.9470
Epoch 59/100
100/100 [==============================] - 176s 2s/step - loss: 0.0411 - acc: 0.9840 - val_loss: 0.0824 - val_acc: 0.9470
Epoch 60/100
100/100 [==============================] - 176s 2s/step - loss: 0.0370 - acc: 0.9810 - val_loss: 0.4691 - val_acc: 0.9490
Epoch 61/100
100/100 [==============================] - 176s 2s/step - loss: 0.0267 - acc: 0.9910 - val_loss: 0.5728 - val_acc: 0.9450
Epoch 62/100
100/100 [==============================] - 176s 2s/step - loss: 0.0284 - acc: 0.9885 - val_loss: 0.0448 - val_acc: 0.9400
Epoch 63/100
100/100 [==============================] - 176s 2s/step - loss: 0.0259 - acc: 0.9900 - val_loss: 0.0022 - val_acc: 0.9390
Epoch 64/100
100/100 [==============================] - 176s 2s/step - loss: 0.0316 - acc: 0.9905 - val_loss: 0.1989 - val_acc: 0.9400
Epoch 65/100
100/100 [==============================] - 176s 2s/step - loss: 0.0268 - acc: 0.9910 - val_loss: 0.0010 - val_acc: 0.9370
Epoch 66/100
100/100 [==============================] - 176s 2s/step - loss: 0.0262 - acc: 0.9880 - val_loss: 0.7435 - val_acc: 0.9370
Epoch 67/100
100/100 [==============================] - 176s 2s/step - loss: 0.0496 - acc: 0.9860 - val_loss: 0.1559 - val_acc: 0.9450
Epoch 68/100
100/100 [==============================] - 176s 2s/step - loss: 0.0278 - acc: 0.9895 - val_loss: 0.0035 - val_acc: 0.9440
Epoch 69/100
100/100 [==============================] - 178s 2s/step - loss: 0.0250 - acc: 0.9920 - val_loss: 0.3468 - val_acc: 0.9390
Epoch 70/100
100/100 [==============================] - 176s 2s/step - loss: 0.0300 - acc: 0.9895 - val_loss: 0.4437 - val_acc: 0.9380
Epoch 71/100
100/100 [==============================] - 176s 2s/step - loss: 0.0178 - acc: 0.9935 - val_loss: 0.1027 - val_acc: 0.9430
Epoch 72/100
100/100 [==============================] - 176s 2s/step - loss: 0.0252 - acc: 0.9910 - val_loss: 0.0863 - val_acc: 0.9400
Epoch 73/100
100/100 [==============================] - 176s 2s/step - loss: 0.0242 - acc: 0.9925 - val_loss: 0.0360 - val_acc: 0.9380
Epoch 74/100
100/100 [==============================] - 176s 2s/step - loss: 0.0298 - acc: 0.9890 - val_loss: 0.0438 - val_acc: 0.9410
Epoch 75/100
100/100 [==============================] - 176s 2s/step - loss: 0.0223 - acc: 0.9920 - val_loss: 0.0321 - val_acc: 0.9390
Epoch 76/100
100/100 [==============================] - 176s 2s/step - loss: 0.0311 - acc: 0.9920 - val_loss: 0.0034 - val_acc: 0.9510
Epoch 77/100
100/100 [==============================] - 176s 2s/step - loss: 0.0237 - acc: 0.9915 - val_loss: 1.6103 - val_acc: 0.9300
Epoch 78/100
100/100 [==============================] - 176s 2s/step - loss: 0.0269 - acc: 0.9910 - val_loss: 0.3459 - val_acc: 0.9390
Epoch 79/100
100/100 [==============================] - 176s 2s/step - loss: 0.0239 - acc: 0.9915 - val_loss: 0.3264 - val_acc: 0.9160
Epoch 80/100
100/100 [==============================] - 176s 2s/step - loss: 0.0189 - acc: 0.9955 - val_loss: 0.0046 - val_acc: 0.9370
Epoch 81/100
100/100 [==============================] - 176s 2s/step - loss: 0.0257 - acc: 0.9915 - val_loss: 0.0033 - val_acc: 0.9460
Epoch 82/100
100/100 [==============================] - 176s 2s/step - loss: 0.0196 - acc: 0.9925 - val_loss: 0.3098 - val_acc: 0.9220
Epoch 83/100
100/100 [==============================] - 176s 2s/step - loss: 0.0197 - acc: 0.9925 - val_loss: 0.5139 - val_acc: 0.9360
Epoch 84/100
100/100 [==============================] - 176s 2s/step - loss: 0.0228 - acc: 0.9925 - val_loss: 0.3574 - val_acc: 0.9260
Epoch 85/100
100/100 [==============================] - 176s 2s/step - loss: 0.0265 - acc: 0.9905 - val_loss: 0.2297 - val_acc: 0.9430
Epoch 86/100
100/100 [==============================] - 176s 2s/step - loss: 0.0163 - acc: 0.9945 - val_loss: 0.0669 - val_acc: 0.9450
Epoch 87/100
100/100 [==============================] - 177s 2s/step - loss: 0.0209 - acc: 0.9905 - val_loss: 0.3930 - val_acc: 0.9410
Epoch 88/100
100/100 [==============================] - 176s 2s/step - loss: 0.0209 - acc: 0.9950 - val_loss: 0.0070 - val_acc: 0.9460
Epoch 89/100
100/100 [==============================] - 178s 2s/step - loss: 0.0202 - acc: 0.9930 - val_loss: 0.1079 - val_acc: 0.9400
Epoch 90/100
100/100 [==============================] - 176s 2s/step - loss: 0.0248 - acc: 0.9940 - val_loss: 0.1333 - val_acc: 0.9440
Epoch 91/100
100/100 [==============================] - 176s 2s/step - loss: 0.0206 - acc: 0.9910 - val_loss: 0.0231 - val_acc: 0.9320
Epoch 92/100
100/100 [==============================] - 176s 2s/step - loss: 0.0223 - acc: 0.9925 - val_loss: 1.8876 - val_acc: 0.9440
Epoch 93/100
100/100 [==============================] - 177s 2s/step - loss: 0.0215 - acc: 0.9920 - val_loss: 0.1581 - val_acc: 0.9440
Epoch 94/100
100/100 [==============================] - 176s 2s/step - loss: 0.0112 - acc: 0.9960 - val_loss: 4.2496e-04 - val_acc: 0.9380
Epoch 95/100
100/100 [==============================] - 176s 2s/step - loss: 0.0181 - acc: 0.9940 - val_loss: 0.1622 - val_acc: 0.9300
Epoch 96/100
100/100 [==============================] - 176s 2s/step - loss: 0.0164 - acc: 0.9950 - val_loss: 0.5708 - val_acc: 0.9440
Epoch 97/100
100/100 [==============================] - 176s 2s/step - loss: 0.0202 - acc: 0.9930 - val_loss: 0.1320 - val_acc: 0.9460
Epoch 98/100
100/100 [==============================] - 176s 2s/step - loss: 0.0240 - acc: 0.9915 - val_loss: 0.5840 - val_acc: 0.9330
Epoch 99/100
100/100 [==============================] - 176s 2s/step - loss: 0.0193 - acc: 0.9920 - val_loss: 0.5215 - val_acc: 0.9310
Epoch 100/100
100/100 [==============================] - 176s 2s/step - loss: 0.0297 - acc: 0.9915 - val_loss: 1.0038 - val_acc: 0.9310
In [32]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[32]:
<matplotlib.legend.Legend at 0x7ff1480f0ed0>

The curves are quite noisy, so let's plot an exponentially smoothed version.

In [33]:
def smooth_curve(points, factor=0.8):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    
    return smoothed_points

plt.plot(epochs, smooth_curve(acc), 'bo', label='Smoothed training acc')
plt.plot(epochs, smooth_curve(val_acc), 'b', label='Smoothed validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, smooth_curve(loss), 'bo', label='Smoothed training loss')
plt.plot(epochs, smooth_curve(val_loss), 'b', label='Smoothed validation loss')
plt.title('Training and validation loss')
plt.legend()
Out[33]:
<matplotlib.legend.Legend at 0x7ff124f970d0>

The accuracy of the model appears to top out at ~94%. Paradoxically, the exponentially weighted moving average of the validation loss is increasing even though accuracy holds steady. This is possible because accuracy depends only on which side of the 0.5 threshold each prediction falls, whereas the loss depends on the full distribution of predicted probabilities: a few confidently wrong predictions can inflate the average loss without changing the accuracy.
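To see how this can happen, here is a small illustrative sketch (synthetic numbers, not taken from the model above): two sets of predictions with identical accuracy but very different average binary cross-entropy, because one set is confidently wrong on a single sample.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 1, 1, 1, 0])

# Both prediction sets misclassify only the last sample (threshold 0.5),
# so their accuracy is identical ...
mild = np.array([0.9, 0.9, 0.9, 0.9, 0.6])       # mildly wrong on last sample
confident = np.array([0.9, 0.9, 0.9, 0.9, 0.99])  # confidently wrong

acc_mild = np.mean((mild > 0.5) == y_true)
acc_conf = np.mean((confident > 0.5) == y_true)
print(acc_mild, acc_conf)  # both 0.8

# ... but the confidently wrong set has a much larger mean loss.
print(binary_crossentropy(y_true, mild).mean())
print(binary_crossentropy(y_true, confident).mean())
```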

Let's look at the performance of the model on the test data.

In [34]:
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(150, 150), 
                                                  batch_size=20, class_mode='binary')

test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)
Found 1000 images belonging to 2 classes.
test acc: 0.9380000233650208
In [35]:
model.save('cats_and_dogs_pretrain.h5')

The model achieves ~94% accuracy on the test data.

7. Visualizing the Convnets

We can visualize the models in three ways:

  1. Visualize intermediate convnet outputs (intermediate activations)
  2. Visualize the convnet filters
  3. Visualize heatmaps of class activation in an image

7.1. Visualizing intermediate activations

In [36]:
from keras.models import load_model

model = load_model('cats_and_dogs_small_2.h5')
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_5 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

We then select an input image to visualize.

In [37]:
img_path = test_dir + '/cats/cat.1700.jpg'

from keras.preprocessing import image

img = image.load_img(img_path, target_size=(150,150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.

print(img_tensor.shape)

plt.imshow(img_tensor[0])
(1, 150, 150, 3)
Out[37]:
<matplotlib.image.AxesImage at 0x7ff1485e9650>

In order to extract the feature maps, we create a model that takes batches of images as input and outputs the activations of all convolutional and pooling layers.

In [38]:
layer_outputs = [layer.output for layer in model.layers[:8]]  # extract output of first eight layers
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

This model has one input and eight outputs: one output per layer activation. Let's plot channels 4 and 7 of the first layer's activation.

In [39]:
activations = activation_model.predict(img_tensor)
first_layer_activation = activations[0]
print(first_layer_activation.shape)

plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
plt.figure()
plt.matshow(first_layer_activation[0, :, :, 7], cmap='viridis')
(1, 148, 148, 32)
Out[39]:
<matplotlib.image.AxesImage at 0x7ff125cba690>
<Figure size 432x288 with 0 Axes>

Let's extract and plot every channel in each of the eight activation maps.

In [40]:
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)
    
images_per_row = 16

for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]  # i.e. the number of channels in a layer
    
    size = layer_activation.shape[1]  # shape is (1, size, size, n_features)
    
    n_rows = n_features // images_per_row
    display_grid = np.zeros((size * n_rows, images_per_row * size))
    
    for row in range(n_rows):
        for col in range(images_per_row):
            channel_image = layer_activation[0, :, :, row * images_per_row + col]
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)  # avoid divide-by-zero for all-zero channels
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[(row * size):((row + 1) * size), (col * size):((col + 1) * size)] = channel_image
            
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1], scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect = 'auto', cmap='viridis')

The first layer acts as a collection of edge detectors. As we go deeper, the activations carry less and less visual information about the input and more and more information about the target, so the later layers are more abstract. In addition, the activations become sparser with depth: for many filters, the pattern they encode is absent from the input image, so their activation is zero.
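The sparsity claim can be quantified by measuring the fraction of zero entries per feature map. The sketch below uses synthetic stand-ins for the real activations; applying `sparsity` to the arrays in `activations` would give the actual figures.

```python
import numpy as np

def sparsity(activation, tol=1e-6):
    """Fraction of (near-)zero entries in an activation map."""
    return float(np.mean(np.abs(activation) < tol))

# Synthetic stand-ins: ReLU applied to zero-mean Gaussian pre-activations
# zeroes out roughly half the entries, while a negatively shifted
# distribution (mimicking a rarely matched deep filter) is almost all zero.
rng = np.random.default_rng(0)
early = np.maximum(rng.normal(0.0, 1.0, (148, 148, 32)), 0)
deep = np.maximum(rng.normal(-3.0, 1.0, (7, 7, 512)), 0)

print(sparsity(early))  # roughly 0.5
print(sparsity(deep))   # close to 1.0
```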

7.2. Visualizing convnet filters

Here, we display the visual pattern that each filter responds to. Starting from a noisy gray image, we apply gradient ascent in input space to find the image that yields the maximal response for the filter.

7.2.1. Building blocks for deriving image with maximal filter response

Let's define the loss.

In [41]:
from keras import backend as K
from keras.applications import VGG16

model = VGG16(weights='imagenet', include_top=False)

layer_name = 'block3_conv1'
filter_index = 0

layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])

We find the gradient of this loss with respect to the input.

In [42]:
grads = K.gradients(loss, model.input)[0]
# ensures magnitude of update to input image is in same range
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)  # add small number to avoid divide-by-zero
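The effect of this normalization can be checked with plain numpy. This is a standalone sketch: `rms_normalize` is a hypothetical helper mirroring the Keras expression above, not part of the notebook.

```python
import numpy as np

def rms_normalize(grads, eps=1e-5):
    """Scale a gradient tensor to roughly unit RMS, as in the Keras expression above."""
    return grads / (np.sqrt(np.mean(np.square(grads))) + eps)

rng = np.random.default_rng(42)
small = rng.normal(0, 1e-2, (150, 150, 3))  # vanishing gradient
large = rng.normal(0, 1e4, (150, 150, 3))   # exploding gradient

# After normalization both have RMS ~1, so a fixed step size behaves the
# same regardless of the raw gradient magnitude.
print(np.sqrt(np.mean(rms_normalize(small) ** 2)))
print(np.sqrt(np.mean(rms_normalize(large) ** 2)))
```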

We define a function iterate that computes the loss and gradient given an input image.

In [43]:
iterate = K.function([model.input], [loss, grads])
loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])

We can then apply gradient descent using the loss and gradient values.

In [44]:
input_img_data = np.random.random((1, 150, 150, 3)) * 20 + 128  # gray image with noise

step = 1.
for i in range(40):
    loss_value, grads_value = iterate([input_img_data])
    input_img_data += grads_value * step

We need to process the image to ensure it lies in the range [0, 255].

In [45]:
def deprocess_image(x):
    x -= x.mean()
    # ensure std dev is 0.1
    x /= (x.std() + 1e-5)
    x *= 0.1
    
    x += 0.5
    x = np.clip(x, 0, 1)
    
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x
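A quick standalone sanity check (the function is repeated here, with a defensive copy, so the cell runs on its own) confirms that arbitrary float inputs come out as uint8 values in [0, 255]:

```python
import numpy as np

def deprocess_image(x):
    # same logic as above, on a copy so the input array is not modified
    x = x.copy().astype('float64')
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1          # center on 0.5 with std dev 0.1
    x += 0.5
    x = np.clip(x, 0, 1)
    x *= 255
    return np.clip(x, 0, 255).astype('uint8')

rng = np.random.default_rng(0)
raw = rng.normal(128, 300, (150, 150, 3))  # wild value range, like a gradient-ascent result

img = deprocess_image(raw)
print(img.dtype, img.min(), img.max())  # uint8, values within [0, 255]
```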

We can now define a function that finds the image with maximal response to any filter.

In [46]:
def generate_pattern(layer_name, filter_index, size=150):
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])  # loss maximizes activation for chosen filter
    
    grads = K.gradients(loss, model.input)[0]  # computes gradient of input image
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
    
    iterate = K.function([model.input], [loss, grads])  # returns loss, grad given input image
    
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128
    
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
        
    img = input_img_data[0]
    return deprocess_image(img)

plt.imshow(generate_pattern('block3_conv1', 0))
Out[46]:
<matplotlib.image.AxesImage at 0x7ff1243e4a50>

We can now look at the first 64 filters of the first layer in each convolution block: block1_conv1, block2_conv1, block3_conv1, block4_conv1 and block5_conv1.

In [47]:
layer_names = ['block1_conv1']  #, 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']
size = 150
margin = 5

for layer_name in layer_names:
    results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))  # empty image to store results

    for i in range(8):
        for j in range(8):
            filter_img = generate_pattern(layer_name, i + (j * 8), size=size)
            horizontal_start = i * size + i * margin
            horizontal_end = horizontal_start + size
            vertical_start = j * size + j * margin
            vertical_end = vertical_start + size
            results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img
        
    plt.figure(figsize=(20, 20))
    plt.imshow(results)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
In [48]:
layer_names = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']

for name in layer_names:
    plt.figure()
    plt.imshow(generate_pattern(name, 0))

We see that the filters from the first layer encode simple directional edges. The second layer encodes textures, and the higher layers encode increasingly complex features such as leaves and eyes.

7.3. Visualizing heatmaps of class activation

This is used to understand what parts of a given image led the network to its classification. This can be useful for understanding misbehaviour by the classifier. We can also locate specific objects in an image.

Class Activation Maps (CAMs) indicate how important each spatial location is for the class under consideration; for example, we can see how cat-like different parts of an image are. The technique takes the output feature map of a convolutional layer for a given input image and weights every channel in that feature map by the gradient of the class score with respect to that channel. In other words, it weights how intensely the input image activates each channel by how important that channel is for the class.
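The channel-weighting step can be sketched in plain numpy with synthetic stand-ins for the real tensors: `feature_map` plays the role of the block5_conv3 output (14 × 14 × 512 for a 224 × 224 input) and `pooled_grads` the per-channel mean gradients of the class score.

```python
import numpy as np

rng = np.random.default_rng(1)
feature_map = np.maximum(rng.normal(0, 1, (14, 14, 512)), 0)  # ReLU feature map
pooled_grads = rng.normal(0, 1, 512)                           # per-channel importance

# Weight every channel by its importance to the class, average over
# channels, and keep only the positive contributions.
cam = np.mean(feature_map * pooled_grads, axis=-1)
cam = np.maximum(cam, 0)
cam /= cam.max()  # normalize to [0, 1] for display

print(cam.shape)  # (14, 14)
```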

In [50]:
from keras.applications.vgg16 import preprocess_input, decode_predictions

model = VGG16(weights='imagenet')

img_path = base_dir + '/creative_commons_elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))

x = image.img_to_array(img)  # numpy array (224, 224, 3)
x = np.expand_dims(x, axis=0)  # adds dimension so size is (1, 224, 224, 3)
x = preprocess_input(x)

Let us see the image.

In [51]:
img_path = base_dir + '/creative_commons_elephant.jpg'

img = image.load_img(img_path, target_size=(600,899))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.

print(img_tensor.shape)

plt.imshow(img_tensor[0])
(1, 600, 899, 3)
Out[51]:
<matplotlib.image.AxesImage at 0x7ff10dc5c2d0>

We can then predict the class.

In [52]:
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=5)[0])
print('Predicted class index:', np.argmax(preds[0]))
Predicted: [('n02504458', 'African_elephant', 0.8721392), ('n01871265', 'tusker', 0.11614813), ('n02504013', 'Indian_elephant', 0.011627928), ('n02408429', 'water_buffalo', 8.415691e-05), ('n02397096', 'warthog', 2.63688e-07)]
Predicted class index: 386

The class index with maximum probability is 386, which corresponds to African_elephant.

We can visualize which parts of the image are the most elephant-like.

In [53]:
african_elephant_output = model.output[:, 386]

last_conv_layer = model.get_layer('block5_conv3')  # last convolutional layer in VGG16
# gradient of the elephant class score with respect to the output feature map of block5_conv3
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]
# vector of shape (512, ) where each entry is mean intensity of gradient over a specific feature-map channel
# note the last feature map has 512 channels
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])

pooled_grads_value, conv_layer_output_value = iterate([x])  # calculate grads and output for the input
# multiply each channel in feature map by how important the channel is to elephant class
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
    
heatmap = np.mean(conv_layer_output_value, axis=-1)  # channel-wise mean of resulting feature map
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
Out[53]:
<matplotlib.image.AxesImage at 0x7ff10d80e310>

Finally, we can use OpenCV to generate an image that superimposes the original image on the heatmap.

In [54]:
import cv2

img = cv2.imread(img_path)
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))  # resizes heatmap to be same size as input
heatmap = np.uint8(255 * heatmap)  # converts heatmap to 8-bit values in [0, 255]
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)  # applies the jet colormap to the heatmap
superimposed_img = heatmap * 0.4 + img

cv2.imwrite(base_dir + '/elephant_cam.jpg', superimposed_img)
Out[54]:
True

We can then plot the written image.

In [55]:
img_path = base_dir + '/elephant_cam.jpg'

img = image.load_img(img_path, target_size=(600,899))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.

print(img_tensor.shape)

plt.imshow(img_tensor[0])
(1, 600, 899, 3)
Out[55]:
<matplotlib.image.AxesImage at 0x7ff0c4088790>
In [ ]: